Anonymizing Unstructured Data
Abstract
In this paper we consider the problem of anonymizing datasets in which each individual is associated with a set of items that constitute private information about the individual. Illustrative datasets include market-basket datasets and search engine query logs. We formalize the notion of k-anonymity for set-valued data as a variant of the k-anonymity model for traditional relational datasets. We define an optimization problem that arises from this definition of anonymity and provide a constant-factor approximation algorithm for it. We evaluate our algorithms on the America Online query log dataset.
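As a rough illustration of what k-anonymity could mean for set-valued data, the sketch below checks whether every distinct item set in a dataset is shared by at least k records. This reading of the definition is an assumption, since the abstract does not spell out the formal definition, and the function name `is_k_anonymous` and the sample baskets are hypothetical. The sketch only tests the property; it is not the paper's approximation algorithm.

```python
from collections import Counter
from typing import Hashable, Iterable


def is_k_anonymous(records: Iterable[Iterable[Hashable]], k: int) -> bool:
    """Return True if every distinct item set occurs at least k times.

    Assumption: a set-valued dataset counts as k-anonymous when each
    record's item set is identical to that of at least k - 1 other records.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # frozenset makes each record hashable and ignores item order.
    counts = Counter(frozenset(record) for record in records)
    return all(count >= k for count in counts.values())


# Hypothetical market-basket data: two identical baskets and one unique one.
baskets = [
    {"milk", "bread"},
    {"bread", "milk"},
    {"milk", "eggs"},
]

print(is_k_anonymous(baskets, 1))  # True: every set occurs at least once
print(is_k_anonymous(baskets, 2))  # False: {"milk", "eggs"} occurs only once
```

Under this reading, an anonymization procedure would have to modify the records, for example by changing which items they contain, until a check like this one passes for the chosen k.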
Similar resources
Anonymizing Unstructured Data to Prevent Privacy Leaks during Data Mining
In this information age, data is becoming more and more important. A lot of data is stored in the cloud, which means that you are not really in control of the data and it might be anywhere. This can lead to data leaks and therefore to privacy leaks. Recently the Dutch government has introduced a new law that makes it obligatory to report any data leaks that involve privacy-sensitive data...
Anonimytext: Anonimization of Unstructured Documents
The anonymization of unstructured texts is nowadays a task of great importance in several text mining applications. Medical records anonymization is needed both to preserve personal health information privacy and to enable further data mining efforts. The described ANONYMITEXT system is designed to de-identify sensitive data from unstructured documents. It has been applied to Spanish clinical notes...
Compromising Anonymity Using Packet Spinning
We present a novel attack targeting anonymizing systems. The attack involves placing a malicious relay node inside an anonymizing system and keeping legitimate nodes “busy.” We achieve this by creating circular circuits and injecting fraudulent packets, crafted in a way that will make them spin an arbitrary number of times inside our artificial loops. At the same time we inject a small number o...
M-Partition Privacy Scheme to Anonymizing Set-Valued Data
In distributed databases there is an increasing need for sharing data that contain personal information. The existing system presented the collaborative data publishing problem for anonymizing horizontally partitioned data at multiple data providers. M-privacy guarantees that anonymized data satisfies a given privacy constraint against any group of up to m colluding data providers. The heuristic al...
Anonymization of Set-Valued Data via Top-Down, Local Generalization
Set-valued data, in which a set of values is associated with an individual, is common in databases ranging from market basket data, to medical databases of patients’ symptoms and behaviors, to query engine search logs. Anonymizing this data is important if we are to reconcile the conflicting demands arising from the desire to release the data for study and the desire to protect the privacy of ...
Journal title:
- CoRR
Volume abs/0810.5582
Publication date 2008